Query-log mining for detecting polysemy and spam
Authors
Abstract
Through their interaction with search engines, users provide implicit feedback that can be used to extract useful knowledge and improve the quality of the search process. This feedback is encoded in a query log, which consists of a sequence of search actions containing information about the queries submitted, the documents viewed, and the documents clicked by users. In this paper, we propose characterizing documents and queries via the information available within a query log, with the goal of detecting either query polysemy or spam hosts and spam queries, i.e., queries that exhibit the undesirable property of showing a higher rate of spam pages in their result lists than other queries. The main contribution of our paper consists of exploiting user feedback and query-log mining to combat spam and identify query polysemy. Our experiments attest to the effectiveness of our approach for the applications
Similar resources
Discovering Popular Clicks' Pattern of Teen Users for Query Recommendation
Search engines are still the most important gateways for information search on the Internet. In this regard, providing the best response to the user's request in the shortest possible time is still desired. Normally, search engines are designed for adults, and few policies have been employed that consider teen users. Teen users are more biased in clicking the results list than are adult users. This leads...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)
Background and Aim: Information systems cannot be well designed or developed without a clear understanding of needs of users, manner of their information seeking and evaluating. This research has been designed to analyze the Ganj (Iranian research institute of science and technology database) users’ query refinement behaviors via log analysis. Methods: The method of this research is log anal...
"In vivo" spam filtering: A challenge problem for data mining
Spam, also known as Unsolicited Commercial Email (UCE), is the bane of email communication. Many data mining researchers have addressed the problem of detecting spam, generally by treating it as a static text classification problem. True in vivo spam filtering has characteristics that make it a rich and challenging domain for data mining. Indeed, real-world datasets with these characteristics a...
Application of Bayesian decision making tool in detecting oil-water contact in a carbonate reservoir
Detection of Oil-Water Contacts (OWCs) is one of the primary tasks before evaluation of reservoir’s hydrocarbon in place, determining net pay zones and suitable depths for perforation operation. This paper introduces Bayesian decision making tool as an effective technique in OWC detecting using wire line logs. To compare strengths of the suggested method in detecting OWC with conventional one, ...